Abstract

Patterns over {-1, 0, 1} define, by their outer products, partially connected neural networks, consisting of internally strongly connected, externally weakly connected subnetworks. The connectivity patterns may have highly organized structures, such as lattices and fractal trees or nests. Subpatterns over {-1, 1} define the subcodes stored in the subnetworks. The network code is defined as the set of permutations of the subcode words, one from each subnetwork, that agree in their common bits. It is first shown that the code words are locally stable states of the network, provided that each of the subcodes consists of mutually orthogonal words or of, at most, two words. Then it is shown that if each of the subcodes consists of two orthogonal words, the code words are the unique ground states (absolute minima) of the Hamiltonian associated with the network. The regions of attraction associated with the code words are shown to grow with the number of subnetworks sharing each of the neurons. Depending on the particular network architecture, the code sizes of partially connected networks can be vastly greater than those of fully connected ones and their error correction capabilities can be significantly greater than those of the disconnected subnetworks. The codes associated with lattice-structured and hierarchical networks are discussed in some detail.


*Y. Baram is with the Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 
32000, Israel. He is also associated with the NASA Ames Research Center, Moffett Field, CA 94035. This work 
was supported in part by the Technion V.P.R. Fund - Albert Einstein Research Fund and in part by the Director’s 
Discretionary Fund, NASA Ames Research Center. 
1. Introduction


Neural networks defined by sums of outer products of binary vectors were shown by Hopfield (ref. 1) to converge to a local minimum of the Hamiltonian associated with the network. The final state was shown to be, with high probability, one of the given vectors, provided that they are nearly mutually orthogonal and their number does not exceed a certain fraction of the number of neurons. While Hopfield mentioned the possibility of disconnected neurons in general terms, the network has generally been perceived as one in which every neuron is connected to all the others. When the stored patterns are selected so as to satisfy orthogonality, they may be viewed as a code, whose size cannot exceed the number of neurons and must be considerably smaller if a substantial error correction capability is desired. Horn and Weyers (ref. 2) have derived conditions under which orthogonal patterns are unique ground states (absolute minima) of the Hamiltonian. These states may be reachable by such mechanisms as simulated annealing (see Kirkpatrick et al., ref. 3). However, the size of the code allowed by fully connected networks is disturbingly small.

In this paper we consider neural networks defined by outer products of vectors over {-1, 0, 1}, which will be called the stored patterns. The vector of nonzero bits of a pattern, ordered according to their order of appearance in the pattern, will be called the subpattern associated with the pattern. Assigning a neuron to every bit position, each group of subpatterns corresponding to the same bits defines a subnetwork, consisting of the associated neurons and of the interneural connections obtained by the sum of their outer products. Since the sum of outer products of {±1} vectors is likely to produce a significant number of nonzero terms, the subnetworks are said to be internally strongly connected. (They are not necessarily fully connected; the exact connectivity pattern is determined by the information in the subpatterns.) On the other hand, the connectivity between the subnetworks is said to be weak, since they only partly overlap. The network code is defined as the set of all permutations of subpatterns, one from each subnetwork, that agree in their common bits. We first show that the code words are locally stable states of the network, provided that each of the subcodes consists of mutually orthogonal words or of, at most, two words. The regions of attraction associated with the code words are shown to grow with the number of subnetworks sharing each of the neurons. Then we show that if each of the subcodes consists of two orthogonal words, the code words are the unique ground states of the Hamiltonian associated with the network. The network structure need not, in general, be highly ordered. However, in order to construct specific codes, some organization must be imposed. Depending on the particular network architecture, partially connected neural networks can have considerably greater code sizes than fully connected ones. As examples, we consider the codes associated with networks structured as lattices and fractal trees or nests.


2. Patterns, Subpatterns, and Subnetworks 


Consider a set of $N$ neurons and a set of $M$ patterns, defined as vectors of dimension $N$, whose components, having a one-to-one correspondence with the neurons, take their values from {-1, 0, 1}. Let the neurons corresponding to the ±1 bits of each of the patterns define a subnetwork, and let the subnetworks be indexed by $i = 1, \ldots, L$. In general, the subnetworks may be of different sizes, but we assume for convenience that the subnetwork size, $K$, is uniform throughout the network. The vector of ±1 bits of a pattern, arranged in their order of appearance, will be called a subpattern. Suppose that there are $M_i$ patterns whose subpatterns are associated with the $i$'th subnetwork, and let them be denoted $w^{(l)}_{(i)}$, $l = 1, \ldots, M_i$. A matrix of synaptic parameters relating the neurons to each other is defined by

$$W = \sum_{i=1}^{L} \sum_{l=1}^{M_i} w^{(l)}_{(i)} \big(w^{(l)}_{(i)}\big)^T \tag{1}$$


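The state $x_i$ of the $i$'th neuron is updated asynchronously according to

$$x_i = \operatorname{sign}\Big\{\sum_{j=1}^{N} W_{ij} x_j\Big\} = \begin{cases} +1 & \text{if } \sum_{j=1}^{N} W_{ij} x_j > 0 \\ x_i & \text{if } \sum_{j=1}^{N} W_{ij} x_j = 0 \\ -1 & \text{if } \sum_{j=1}^{N} W_{ij} x_j < 0 \end{cases} \tag{2}$$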

The pattern structures, that is, the distribution of the {-1, 0, 1} values in the bit positions, determine the connectivity structure of the network, as illustrated by the following examples.


Example 2.1 The patterns

$$w^{(1)}_{(1)} = (+\,+\,+\,+\,0\,0)^T, \quad w^{(1)}_{(2)} = (0\,0\,+\,+\,+\,+)^T,$$
$$w^{(2)}_{(1)} = (+\,-\,+\,-\,0\,0)^T, \quad w^{(2)}_{(2)} = (0\,0\,+\,-\,+\,-)^T,$$

where + and - represent +1 and -1, respectively, define the six-neuron network depicted in figure 1(a), which consists of two subnetworks of size four, sharing two neurons. The associated connectivity matrix

$$W = \begin{pmatrix} 2&0&2&0&0&0 \\ 0&2&0&2&0&0 \\ 2&0&4&0&2&0 \\ 0&2&0&4&0&2 \\ 0&0&2&0&2&0 \\ 0&0&0&2&0&2 \end{pmatrix}$$

defines the interneural connections represented in the figure by solid lines.
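As a quick numerical check of equation (1), the sketch below builds $W$ as the sum of outer products of the four patterns of example 2.1 and reproduces the matrix above. It is a minimal pure-Python illustration, not code from the original report; the helper name `outer_product_sum` is hypothetical, and the diagonal (self-connection) terms are kept, as in the printed matrix.

```python
def outer_product_sum(patterns):
    """Return W with W[j][k] = sum over stored patterns p of p[j] * p[k] (equation (1))."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for j in range(n):
            for k in range(n):
                W[j][k] += p[j] * p[k]
    return W

# Patterns of example 2.1 over {-1, 0, 1}; 0 marks a neuron outside the subnetwork.
patterns = [
    [1, 1, 1, 1, 0, 0],    # w^(1)_(1)
    [0, 0, 1, 1, 1, 1],    # w^(1)_(2)
    [1, -1, 1, -1, 0, 0],  # w^(2)_(1)
    [0, 0, 1, -1, 1, -1],  # w^(2)_(2)
]

W = outer_product_sum(patterns)
for row in W:
    print(row)

# An off-diagonal zero W[j][k] means that neurons j and k are not connected.
connections = [(j, k) for j in range(6) for k in range(j + 1, 6) if W[j][k] != 0]
print(connections)
```

Indices in the sketch are zero-based, so a pair such as (0, 2) corresponds to neurons 1 and 3 in the text. Only four connections survive, which shows how the sign structure of the subpatterns, and not only their overlap, thins out the network.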

The above example shows that partial connectivity can be created not only by partial overlap between the stored subpatterns, but also by the information content of the subpatterns themselves. This is further illustrated by the following example.


Example 2.2 The subpatterns

$$w^{(1)}_{(1)} = (+\,+\,+\,+)^T \quad\text{and}\quad w^{(2)}_{(1)} = (+\,+\,-\,-)^T$$

yield the matrix

$$W = \begin{pmatrix} 2&2&0&0 \\ 2&2&0&0 \\ 0&0&2&2 \\ 0&0&2&2 \end{pmatrix}$$

which defines the four-neuron "subnetwork" depicted in figure 1(b). The subpatterns

$$w^{(1)}_{(1)} = (+\,+\,+\,+)^T \quad\text{and}\quad w^{(2)}_{(1)} = (+\,-\,+\,-)^T$$

yield the matrix

$$W = \begin{pmatrix} 2&0&2&0 \\ 0&2&0&2 \\ 2&0&2&0 \\ 0&2&0&2 \end{pmatrix}$$

which defines the square subnetwork depicted in figure 1(c). Similarly, the subpatterns

$$w^{(1)}_{(1)} = (+\,+\,+\,+\,+\,+\,+\,+)^T \quad\text{and}\quad w^{(2)}_{(1)} = (+\,-\,+\,-\,+\,-\,+\,-)^T$$

define the cubical subnetwork depicted in figure 1(d).

Figure 1: Partially connected subnetworks for examples 2.1 and 2.2.

We call the set of subpatterns corresponding to a subnetwork a subcode. The code of the network is defined as the set of vectors consisting of permutations of subcode words, one corresponding to each of the subnetworks, that agree in their common bits, together with the converses (negative versions) of these vectors. In example 2.1, the subcode stored in each of the subnetworks is {(+ + + +), (+ - + -)} and the network code is {(+ + + + + +), (+ - + - + -), (- - - - - -), (- + - + - +)}.
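The definition of the network code can be turned into a brute-force enumeration: try every choice of one subcode word per subnetwork, keep the choices that agree on shared neurons, and add the converses. The sketch below is an illustrative assumption, exponential in the number of subnetworks (the box and pyramid recursions of Section 5 organize the same search more economically); the function name `network_code` and the (indices, words) representation of a subnetwork are hypothetical.

```python
from itertools import product

def network_code(n, subnets):
    """Enumerate the network code.

    n       -- number of neurons
    subnets -- list of (neuron_indices, subcode_words); each word is a tuple of
               +1/-1 values listed in the order of neuron_indices.
    Assumes every neuron belongs to at least one subnetwork.
    """
    code = set()
    for choice in product(*(words for _, words in subnets)):
        x = [0] * n  # 0 = neuron not yet assigned
        consistent = True
        for (indices, _), word in zip(subnets, choice):
            for j, bit in zip(indices, word):
                if x[j] == 0:
                    x[j] = bit
                elif x[j] != bit:  # disagreement on a shared neuron
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            code.add(tuple(x))
            code.add(tuple(-b for b in x))  # the converse is also a code word
    return code

# Example 2.1: two four-neuron subnetworks sharing neurons 2 and 3 (zero-based).
subcode = [(1, 1, 1, 1), (1, -1, 1, -1)]
code = network_code(6, [([0, 1, 2, 3], subcode), ([2, 3, 4, 5], subcode)])
for word in sorted(code, reverse=True):
    print(word)
```

Of the four possible choices, only the two that pick the same subcode word in both subnetworks survive the agreement test; adding converses gives the four code words listed above.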
3. Local Stability and Attraction


3.1 Equilibrium States 


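It follows from equation (2) that the equilibrium states of the network are the states that satisfy the equation

$$x = \operatorname{sign}\{Wx\} = \operatorname{sign}\Big\{\sum_{i=1}^{L} \sum_{l=1}^{M_i} \big(w^{(l)}_{(i)}, x\big)\, w^{(l)}_{(i)}\Big\} \tag{3}$$

where $(x, y) = x^T y$, which implies that if $x$ is an equilibrium point, so is its converse. Suppose that the network is probed by a code word $x$, and let $l_i$ denote the index of the subcode word in $x$ corresponding to the $i$'th subnetwork. Suppose further that the $k$'th neuron is shared by $n_k$ subnetworks. Then

$$[Wx]_k = K n_k x_k + \sum_{i=1}^{L} \sum_{\substack{l=1 \\ l \neq l_i}}^{M_i} \big(w^{(l)}_{(i)}, x\big)\big[w^{(l)}_{(i)}\big]_k \tag{4}$$

where the first term arises because $(w^{(l_i)}_{(i)}, x) = K$ for every $i$, while $[w^{(l_i)}_{(i)}]_k = x_k$ for each of the $n_k$ subnetworks containing the $k$'th neuron and is zero otherwise.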

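Denoting the indices of the subnetworks that share the $k$'th neuron $i_k = 1, \ldots, n_k$, it can be seen that a sufficient condition for a code word to be an equilibrium state of the network is

$$K n_k > \Big|\sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{M_{i_k}} \big(w^{(l)}_{(i_k)}, x\big)\big[w^{(l)}_{(i_k)}\big]_k\Big| \tag{5}$$

If the subnetworks are disconnected, the condition becomes

$$K > \Big|\sum_{\substack{l=1 \\ l \neq l_i}}^{M_i} \big(w^{(l)}_{(i)}, x\big)\big[w^{(l)}_{(i)}\big]_k\Big| \tag{6}$$

where $i$ is the single subnetwork containing the $k$'th neuron, which is stricter than (5).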

A question of interest is whether code words can be guaranteed to be equilibrium points. It can readily be seen that when the subpatterns corresponding to each of the subnetworks are mutually orthogonal, each of the code words is an equilibrium point of the network, as in this case

$$\big(w^{(l)}_{(i)}, x\big) = 0 \quad \text{for all } l \neq l_i$$

and condition (5) is satisfied. Suppose next that each of the subnetworks stores, at most, two subpatterns, associated with $w^{(1)}_{(i)}$ and $w^{(2)}_{(i)}$, $i = 1, \ldots, L$. Let a code word $x$ consist of a permutation of these subpatterns, one per subnetwork, $w^{(l_i)}_{(i)}$, $i = 1, \ldots, L$, $l_i = 1$ or $2$. Then

$$[Wx]_k = K n_k x_k + \sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{2} \big(w^{(l)}_{(i_k)}, x\big)\big[w^{(l)}_{(i_k)}\big]_k \tag{7}$$

Since

$$\big|\big(w^{(l)}_{(i_k)}, x\big)\big| \leq K$$

with equality if and only if $w^{(l)}_{(i_k)}$ is equal to $w^{(l_{i_k})}_{(i_k)}$ or to its converse, (5) is satisfied whenever the two subcode words of each subnetwork are distinct and not converses of each other. It follows that $x$ is an equilibrium point.

We have seen that the code words are equilibrium points of the network if the subcode words stored in each of the subnetworks are mutually orthogonal or if their number does not exceed two. The question arises whether in these cases the code words are the only equilibrium points. As the following example shows, this is not necessarily the case, even when each of the subcodes consists of two orthogonal words.

Example 3.1 Consider the network of example 2.1. It can readily be verified that the code words $x_1 = (+\,+\,+\,+\,+\,+)^T$, $x_2 = (+\,-\,+\,-\,+\,-)^T$, $x_3 = -x_1$ and $x_4 = -x_2$ are equilibrium points of the network. It can also be verified, however, that the pattern $x = (+\,+\,+\,-\,+\,-)^T$, which differs from $x_2$ by a single bit, is also an equilibrium point satisfying (3): for this state $Wx = (4, 0, 8, -4, 4, -4)^T$, so every neuron either agrees in sign with its input or, in the case of the second neuron, receives zero input and is left unchanged by (2). In Section 4 it is shown that such equilibrium states are local, not absolute, minima of the associated Hamiltonian.
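The claims of example 3.1 can be checked mechanically. The minimal sketch below (pure Python, with hypothetical helper names) rebuilds $W$ for example 2.1, tests the fixed-point condition of update rule (2) including its zero-input case, and evaluates the Hamiltonian $H(x) = -x^T W x$ used in Section 4.

```python
patterns = [
    [1, 1, 1, 1, 0, 0], [0, 0, 1, 1, 1, 1],
    [1, -1, 1, -1, 0, 0], [0, 0, 1, -1, 1, -1],
]
N = 6
W = [[sum(p[j] * p[k] for p in patterns) for k in range(N)] for j in range(N)]

def local_field(x):
    """Return the vector Wx."""
    return [sum(W[j][k] * x[k] for k in range(N)) for j in range(N)]

def is_equilibrium(x):
    """True if no neuron changes under rule (2); a zero input leaves x_j unchanged."""
    return all(h == 0 or (h > 0) == (xj > 0) for xj, h in zip(x, local_field(x)))

def hamiltonian(x):
    """H(x) = -x^T W x."""
    return -sum(xj * h for xj, h in zip(x, local_field(x)))

x1 = [1, 1, 1, 1, 1, 1]
x2 = [1, -1, 1, -1, 1, -1]
code_words = [x1, x2, [-b for b in x1], [-b for b in x2]]
spurious = [1, 1, 1, -1, 1, -1]  # differs from x2 in the second bit

print([is_equilibrium(x) for x in code_words], is_equilibrium(spurious))  # -> [True, True, True, True] True
print([hamiltonian(x) for x in code_words], hamiltonian(spurious))        # -> [-32, -32, -32, -32] -24
print(local_field(spurious))                                              # -> [4, 0, 8, -4, 4, -4]
```

All four code words reach $H = -K^2 L = -32$ (with $K = 4$, $L = 2$), while the spurious equilibrium sits higher at $H = -24$.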


3.2 Local Attraction 

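We next examine the regions of attraction associated with the code words. Let $\bar{x}$ denote a code word, consisting of the subpatterns corresponding to $w^{(l_i)}_{(i)}$, $i = 1, \ldots, L$. A pattern $x$ is in the region of attraction of $\bar{x}$ if

$$\operatorname{sign}\{Wx\} = \bar{x} \tag{8}$$

If the network's state is within the region of attraction of $\bar{x}$, then it will converge to the latter with probability 1, provided that the probability of each of the neurons being selected for update on each step is nonzero. Suppose first that each of the subnetworks stores two subpatterns, which may not be orthogonal, corresponding to $w^{(1)}_{(i)}$ and $w^{(2)}_{(i)}$, $i = 1, \ldots, L$. Indexing, as before, the subnetworks sharing the $k$'th neuron $i_k = 1, \ldots, n_k$, and writing

$$[Wx]_k = \sum_{i_k=1}^{n_k} \big(w^{(l_{i_k})}_{(i_k)}, x\big)\big[w^{(l_{i_k})}_{(i_k)}\big]_k + \sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{2} \big(w^{(l)}_{(i_k)}, x\big)\big[w^{(l)}_{(i_k)}\big]_k \tag{9}$$

where $l_{i_k} = 1$ or $2$, and noting that $[w^{(l_{i_k})}_{(i_k)}]_k = \bar{x}_k$ for all $i_k$, it can be seen that a sufficient condition for $x$ to be in the attraction region of $\bar{x}$ is that, for every neuron $k$,

$$\sum_{i_k=1}^{n_k} \big(w^{(l_{i_k})}_{(i_k)}, x\big) > \Big|\sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{2} \big(w^{(l)}_{(i_k)}, x\big)\big[w^{(l)}_{(i_k)}\big]_k\Big| \tag{10}$$

A sufficient condition for (10) is

$$\sum_{i_k=1}^{n_k} \big(w^{(l_{i_k})}_{(i_k)}, x\big) > \sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{2} \big|\big(w^{(l)}_{(i_k)}, x\big)\big| \tag{11}$$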

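Next, suppose that the stored subcode words are mutually orthogonal and that their number in each of the subnetworks is uniformly $M$. Further, suppose that the subpatterns $w^{(l_i)}_{(i)}$, $i = 1, \ldots, L$, agree in their common bits, forming a code word. Denoting by $d[x, y]$ the Hamming distance between $x$ and $y$, measured over the bits of the subnetwork in question, so that $(w^{(l)}_{(i)}, x) = K - 2d[w^{(l)}_{(i)}, x]$, we have

$$[Wx]_k = \sum_{i_k=1}^{n_k} \big(K - 2d[w^{(l_{i_k})}_{(i_k)}, x]\big)\big[w^{(l_{i_k})}_{(i_k)}\big]_k + \sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{M} \big(K - 2d[w^{(l)}_{(i_k)}, x]\big)\big[w^{(l)}_{(i_k)}\big]_k \tag{12}$$

Since there is no conflict between the $k$'th bits of $w^{(l_{i_k})}_{(i_k)}$, $i_k = 1, \ldots, n_k$, it follows from the neural update rule (2) that the distances $d[w^{(l_{i_k})}_{(i_k)}, x]$, $i_k = 1, \ldots, n_k$, will decrease (if $x_k \neq [w^{(l_{i_k})}_{(i_k)}]_k$) or remain the same (if $x_k = [w^{(l_{i_k})}_{(i_k)}]_k$) if

$$\sum_{i_k=1}^{n_k} \big(K - 2d[w^{(l_{i_k})}_{(i_k)}, x]\big) > \sum_{i_k=1}^{n_k} \sum_{\substack{l=1 \\ l \neq l_{i_k}}}^{M} \big|K - 2d[w^{(l)}_{(i_k)}, x]\big| \tag{13}$$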

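Since the stored subpatterns are mutually orthogonal, any two of them differ in exactly $K/2$ bits, and the triangle inequality for the Hamming distance gives $K/2 - d[w^{(l_{i_k})}_{(i_k)}, x] \leq d[w^{(l)}_{(i_k)}, x] \leq K/2 + d[w^{(l_{i_k})}_{(i_k)}, x]$. Hence the maximum possible value of $|K - 2d[w^{(l)}_{(i_k)}, x]|$ is $2d[w^{(l_{i_k})}_{(i_k)}, x]$ (it is obtained for either $d[w^{(l)}_{(i_k)}, x] = K/2 - d[w^{(l_{i_k})}_{(i_k)}, x]$ or $d[w^{(l)}_{(i_k)}, x] = K/2 + d[w^{(l_{i_k})}_{(i_k)}, x]$). This implies that convergence to the code word in question can be guaranteed if

$$\sum_{i_k=1}^{n_k} \big(K - 2d[w^{(l_{i_k})}_{(i_k)}, x]\big) > \sum_{i_k=1}^{n_k} 2(M-1)\, d[w^{(l_{i_k})}_{(i_k)}, x] \tag{14}$$

or

$$\sum_{i_k=1}^{n_k} d[w^{(l_{i_k})}_{(i_k)}, x] < \frac{n_k K}{2M} \tag{15}$$

For disconnected subnetworks, a sufficient condition is

$$K - 2d[w^{(l_{i_k})}_{(i_k)}, x] > 2(M-1)\, d[w^{(l_{i_k})}_{(i_k)}, x]$$

yielding, for each of the subnetworks,

$$d[w^{(l_{i_k})}_{(i_k)}, x] < \frac{K}{2M}, \quad i_k = 1, \ldots, n_k \tag{16}$$

which is more restrictive than (15). The number of mutually orthogonal subpatterns that can be stored in a subnetwork is, of course, restricted by the subnetwork size. It can be seen from (15) and (16) that the maximal guaranteed regions of attraction are obtained for $M = 2$ (disregarding the trivial case $M = 1$).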
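The sufficient condition (15) and the relaxation it guarantees can be exercised on a small network. The sketch below uses a hypothetical 12-neuron network (not one of the paper's examples) made of two eight-neuron subnetworks sharing four neurons, each storing the orthogonal pair (+ + + + + + + +) and (+ - + - + - + -); the layout, the helper names, and the random update order are assumptions made for illustration.

```python
import random

K, M, N = 8, 2, 12
# Hypothetical layout: subnetwork 0 holds neurons 0..7, subnetwork 1 holds neurons 4..11,
# so neurons 4..7 are shared (n_k = 2) and the others belong to one subnetwork (n_k = 1).
subnets = [list(range(0, 8)), list(range(4, 12))]
subcode = [(1,) * 8, (1, -1) * 4]  # M = 2 mutually orthogonal subpatterns

def embed(indices, word):
    """Place a subpattern into an N-dimensional {-1, 0, 1} pattern."""
    v = [0] * N
    for j, bit in zip(indices, word):
        v[j] = bit
    return v

patterns = [embed(idx, w) for idx in subnets for w in subcode]
W = [[sum(p[j] * p[k] for p in patterns) for k in range(N)] for j in range(N)]

def distance(indices, word, x):
    """Hamming distance d[w, x] over the bits of one subnetwork."""
    return sum(1 for j, bit in zip(indices, word) if x[j] != bit)

def satisfies_15(x, chosen):
    """Check condition (15) at every neuron; chosen[i] is the target subcode word of subnetwork i."""
    for k in range(N):
        sharing = [i for i, idx in enumerate(subnets) if k in idx]
        total = sum(distance(subnets[i], chosen[i], x) for i in sharing)
        if not total < len(sharing) * K / (2 * M):
            return False
    return True

def relax(x, rng):
    """Asynchronous updates by rule (2), in random order, until no neuron changes."""
    x = list(x)
    changed = True
    while changed:
        changed = False
        for j in rng.sample(range(N), N):
            h = sum(W[j][k] * x[k] for k in range(N))
            new = 1 if h > 0 else (-1 if h < 0 else x[j])
            if new != x[j]:
                x[j], changed = new, True
    return x

target = [1] * N                   # code word: the all-plus subpattern in both subnetworks
chosen = [subcode[0], subcode[0]]
probe = list(target)
probe[0] = -1                      # one bit error in subnetwork 0
print(satisfies_15(probe, chosen), relax(probe, random.Random(0)) == target)  # -> True True
```

For a shared neuron ($n_k = 2$) the bound in (15) is $n_k K / 2M = 4$, whereas a neuron belonging to a single subnetwork only gets $K / 2M = 2$; this is the mechanism by which sharing enlarges the guaranteed region of attraction.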


We have seen that when the stored subpatterns corresponding to the subnetworks are mutually orthogonal, or if their number for each subnetwork does not exceed two, the code words are equilibrium points of the network, with regions of attraction that grow with the number of subnetworks sharing each of the neurons. The maximal guaranteed region of attraction is obtained for two orthogonal subpatterns corresponding to each of the subnetworks (excluding the trivial case of a single subpattern per subnetwork). It should be emphasized that even in this case there may be spurious local attractors, which are not code words.
4. Ground States


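Suppose that the subpatterns stored in each of the subnetworks are mutually orthogonal and consider the Hamiltonian

$$H(x) = -x^T W x \tag{17}$$

which, by (1), may be written as

$$H(x) = -\sum_{i=1}^{L} \sum_{l=1}^{M_i} \big(w^{(l)}_{(i)}, x\big)^2 \tag{18}$$

Noting that, for any code word $x$,

$$\big(w^{(l_i)}_{(i)}, x\big) = K \tag{19}$$

while, by orthogonality, $(w^{(l)}_{(i)}, x) = 0$ for $l \neq l_i$, we obtain

$$H(x) = -K^2 L \tag{20}$$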

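Let

$$\hat{x}^{(l)}_{(i)} = \frac{\big(w^{(l)}_{(i)}, x\big)}{\|w^{(l)}_{(i)}\|^2}\, w^{(l)}_{(i)} = \frac{1}{K}\big(w^{(l)}_{(i)}, x\big)\, w^{(l)}_{(i)} \tag{21}$$

where $\|x\|^2 = (x, x)$, denote the orthogonal projection of an arbitrary pattern $x$ on $w^{(l)}_{(i)}$. We have

$$Wx = \sum_{i=1}^{L} \sum_{l=1}^{M_i} \big(w^{(l)}_{(i)}, x\big)\, w^{(l)}_{(i)} \tag{22}$$

Substituting (21) into (22) yields

$$Wx = K \sum_{i=1}^{L} \sum_{l=1}^{M_i} \hat{x}^{(l)}_{(i)} \tag{23}$$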

Let us denote by $x_{(i)}$ the part of $x$ corresponding to the $i$'th subnetwork and by $S_i$ the subspace spanned by the subpatterns corresponding to this subnetwork, that is, $S_i = \operatorname{span}\{w^{(l)}_{(i)},\ l = 1, \ldots, M_i\}$. The orthogonal projection of $x$ on $S_i$ is

$$\hat{x}_{(i)} = \sum_{l=1}^{M_i} \hat{x}^{(l)}_{(i)} \tag{24}$$

It can be seen that

$$\big(x, \hat{x}_{(i)}\big) = \big(x_{(i)}, \hat{x}_{(i)}\big) = \|\hat{x}_{(i)}\|^2 \leq \|x_{(i)}\|^2 = K \tag{25}$$

with equality if and only if $\hat{x}_{(i)} = x_{(i)}$, which is the case if and only if $x_{(i)}$ belongs to $S_i$. It follows from (21) and (24) that

$$\sum_{l=1}^{M_i} \big(w^{(l)}_{(i)}, x\big)^2 = K\big(x, \hat{x}_{(i)}\big) \leq K^2 \tag{26}$$

yielding, by (18),

$$H(x) \geq -K^2 L \tag{27}$$

with equality if and only if $x_{(i)}$ belongs to $S_i$ for all $i = 1, \ldots, L$. We have thus shown that an arbitrary state $x$ has a minimal Hamiltonian value if and only if all its parts corresponding to the subnetworks, $x_{(i)}$, belong to the corresponding subspaces $S_i$, $i = 1, \ldots, L$.
Suppose that each of the subnetworks stores only two orthogonal words. Further suppose that $x$ has minimal $H(x)$, hence that $x_{(i)}$ belongs to $S_i$ for all $i$. We next show that this implies that $x$ is a code word. Since the two subcode words stored in each of the subnetworks agree in at least one bit and differ in at least one bit (being orthogonal, they agree in exactly $K/2$ bits), and since $x_{(i)}$, whose components are ±1, belongs to $S_i$, there exist two bits corresponding to the $i$'th subnetwork, say the $k$'th, in which the two words agree, and the $m$'th, in which they differ, and scalars $c_1$ and $c_2$ such that

$$c_1\big[w^{(1)}_{(i)}\big]_k + c_2\big[w^{(2)}_{(i)}\big]_k = \pm 1 \tag{28}$$

and

$$c_1\big[w^{(1)}_{(i)}\big]_m + c_2\big[w^{(2)}_{(i)}\big]_m = \pm 1 \tag{29}$$

yielding

$$c_1 + c_2 = \pm 1 \tag{30}$$

and

$$c_1 - c_2 = \pm 1 \tag{31}$$

However, taking the squares of both equations and subtracting, we obtain

$$c_1 c_2 = 0 \tag{32}$$

implying that either $c_1 = 0$ and $c_2 = \pm 1$, or $c_2 = 0$ and $c_1 = \pm 1$. But this means that $x_{(i)}$ is equal to $w^{(1)}_{(i)}$ or $w^{(2)}_{(i)}$, or to the converse of one of them. Hence $x$ is a code word. We have thus proven the following result.

Theorem Suppose that each of the subnetworks stores two orthogonal subpatterns. Then the only states of minimal $H$ are the code words.
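For a network as small as that of example 2.1, the theorem can be confirmed by exhaustive search. The minimal sketch below (pure Python; helper names are hypothetical) evaluates both forms (17) and (18) of the Hamiltonian over all $2^6$ states and collects the minimizers.

```python
from itertools import product

# Example 2.1 network: two subnetworks, each storing two orthogonal subpatterns.
patterns = [
    [1, 1, 1, 1, 0, 0], [1, -1, 1, -1, 0, 0],
    [0, 0, 1, 1, 1, 1], [0, 0, 1, -1, 1, -1],
]
N, K, L = 6, 4, 2
W = [[sum(p[j] * p[k] for p in patterns) for k in range(N)] for j in range(N)]

def H(x):
    """Hamiltonian (17): H(x) = -x^T W x."""
    return -sum(x[j] * W[j][k] * x[k] for j in range(N) for k in range(N))

def H_projections(x):
    """Equivalent form (18): H(x) = -sum over stored patterns w of (w, x)^2."""
    return -sum(sum(pj * xj for pj, xj in zip(p, x)) ** 2 for p in patterns)

states = list(product((-1, 1), repeat=N))  # all 2^N states
h_min = min(H(x) for x in states)
ground = {x for x in states if H(x) == h_min}
print(h_min, -K ** 2 * L)                   # minimum versus the bound (27)
print(sorted(ground))
```

The minimum equals $-K^2 L = -32$, and the minimizing set is exactly the four code words of example 2.1; the spurious equilibrium of example 3.1, with $H = -24$, is therefore a local but not an absolute minimum.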

As in the case of fully connected networks (ref. 1), the Hamiltonian is nonincreasing along any trajectory in the state space, due to the symmetry of $W$ and the asynchronous neural update rule. This, however, does not imply that a ground state will be reached by direct convergence from any initial state, as there can be local minima corresponding to higher values of the Hamiltonian. A ground state will be attained by direct relaxation if the initial state falls within its region of attraction. This would be the case if the initial state represents a sufficiently close approximation of a code word. For larger errors, more complex mechanisms, such as simulated annealing (see Kirkpatrick et al., ref. 3), may be necessary for reaching the ground states.


5. Some Structural Codes 


So far, no assumptions have been made on the network's connectivity structure, which is generally determined by the stored patterns. Highly structured networks allowing only certain connections between neurons may result from the storage of similarly structured patterns, or may be given prior to the storage of information. In the latter case, the stored information will impose a structure within the given structure. It should be emphasized that, for the purposes of this paper, the network structure merely defines the connections between neurons and between subnetworks, which may be physically realized by fibers taking any geometric form or no uniformly prescribed form at all. Specific geometries would be meaningful, however, in the physical construction of neural networks.
5.1 Lattice Networks


We consider a network of $N$ neurons, grouped into subnetworks of $K$ neurons each, whose internal connections are to be determined by the stored information. The relationship between the subnetworks is defined by a lattice structure, in which the subnetworks form the building blocks or the fundamental units (see, e.g., Conway and Sloane, ref. 4). In order to characterize the code of a lattice neural network (not to be confused with "lattice codes," which are the centers of the fundamental units, ref. 4), we form a set of "Chinese boxes" as follows. Select some subnetwork to be the innermost, or the first, box. This subnetwork also defines the first shell. The subnetworks directly connected to it define the second shell. Together, the first box and the second shell define the second box. The subnetworks directly connected to the second box form the third shell, and together they define the third box, and so on. Let us denote the subcode for the $j$'th subnetwork of the $k$'th shell $c^k_j$ and the code corresponding to the $k$'th box $C_k$. Denoting by $r_k$ a word of $C_k$, by $r_{k,k}$ its part in the $(k-1)$'th box and by $r^k_k$ its part in the $k$'th shell, let $c^k_j(r_{k,k})$ be the code allowed by $r_{k,k}$ in the $j$'th subnetwork of the $k$'th shell. Denoting by $L_k$ the number of subnetworks in the $k$'th shell, the set of permissible permutations of words in the subnetworks of the $k$'th shell associated with $r_{k,k}$ is given by the Cartesian product $\bigotimes_{j=1}^{L_k} c^k_j(r_{k,k})$. The box codes may be defined progressively by

$$C_{k+1} = \Big\{ r_{k+1} :\ r_{k+1,k+1} \in C_k,\ r^{k+1}_{k+1} \in \bigotimes_{j=1}^{L_{k+1}} c^{k+1}_j(r_{k+1,k+1}) \Big\} \tag{33}$$

with $C_1 = c^1_1$, as the innermost box consists of a single subnetwork. Denoting by $M^k_j$ the number of subwords in $c^k_j(r_{k,k})$ (assuming that this number does not depend on the particular $r_{k,k}$), and by $Q$ the number of shells, the size of the network code is given by

$$M = \prod_{k=1}^{Q} \prod_{j=1}^{L_k} M^k_j \tag{34}$$

These expressions provide a general characterization of lattice neural codes. More concrete formulations require specifying both the particular lattice structure and the subcodes stored in the subnetworks. The following examples, which involve networks structured as lattices in the plane with relatively small subnetworks, illustrate that certain lattice structures can yield larger codes than others.

Example 5.1 Consider the diamond (or "checkerboard") lattice network depicted in figure 2. It is not difficult to see that the addition of the $Q$'th shell to the $(Q-1)$'th box adds $4(Q-1)$ subnetworks, which, in turn, add $8Q - 4$ neurons ($3 \times 4 + 2(4(Q-1) - 4)$). The size of a network of $Q$ shells is readily obtained as $N = 4Q^2$. Suppose that a subcode consisting of the two subwords (+ + + +) and (+ - + -) is stored in each of the subnetworks. It can be seen that the addition of a shell to the network multiplies the code size by 16 (there are two possible subwords for each of the four corner subnetworks; the words in the other subnetworks of the shell are determined by the previous box). The code size for a network of $Q$ shells is, then, $M = 4 \times 16^{Q-1}$ (including converses). The ratio of code size to network size,

$$\frac{M}{N} = \frac{16^{Q-1}}{Q^2}$$

is a monotone increasing function of $Q$. For the three-shell network shown, the code consists of $4 \times 16^2 = 1024$ words, two of which are shown in the figure.
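The counting argument of example 5.1 is simple arithmetic; the minimal sketch below restates it in Python to tabulate $N = 4Q^2$, $M = 4 \times 16^{Q-1}$ and the exact ratio $M/N$ for the first few shell counts. It only encodes the counts stated in the example, not a simulation of the lattice; the function name `diamond_lattice` is hypothetical.

```python
from fractions import Fraction

def diamond_lattice(Q):
    """Network size N and code size M of a Q-shell diamond lattice network (example 5.1)."""
    # The first box is one four-neuron subnetwork; shell q >= 2 adds 8q - 4 neurons.
    N = 4 + sum(8 * q - 4 for q in range(2, Q + 1))
    # 4 words from the first box (2 subwords and their converses); each further shell
    # multiplies the count by 2**4 = 16 (two subwords for each of its four corner subnetworks).
    M = 4 * (2 ** 4) ** (Q - 1)
    return N, M

for Q in range(1, 5):
    N, M = diamond_lattice(Q)
    print(Q, N, M, Fraction(M, N))
```

For $Q = 3$ the sketch gives $N = 36$ and $M = 1024$, matching the example; the exact ratio grows by a factor $16Q^2/(Q+1)^2 \geq 4$ per shell.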
Figure 2: Two of the 1024 code words of a three-shell diamond lattice network storing the subcode {(+ + + +), (+ - + -)} in each of the subnetworks.

Example 5.2 Now consider the circular lattice of figure 3. Suppose that the subcode to be stored 
in each of the subnetworks is {(+ + + + + +), (+ - + - + -)}. It is not difficult to see that the
first two shells uniquely determine the neural state values for the rest of the network and that the 
network code size cannot be greater than eight (including converses), regardless of the network 
size. The reason for this finite code size is the tight packing of the lattice, which causes each 
subnetwork to share pairs of neighboring neurons with pairs of neighboring subnetworks. Half 
the code is shown in the figure (the converses of the shown words constitute the other half). 

Example 5.3 In order to examine the error correction capability of a lattice network, we considered the portion of the network of figure 3 consisting of the seven inner circular subnetworks. The subcode {(+ + + + + +), (+ - + - + -)} was stored in each of the subnetworks. Each of the eight code words was corrupted by errors, so that each bit had its sign reversed with probability $p$. Fifty corrupted versions of each of the eight code words were generated and presented to the network, which was allowed, for each such probe, to relax to a final state. The final error for each of the probes was calculated as the Hamming distance $(x - x_f)^T (x - x_f)/4$, where $x$ is the corresponding uncorrupted code word and $x_f$ is the corresponding final state. The average error was calculated for the entire set of 400 words. The experiment was then repeated for a network consisting of the disconnected subnetworks (seven subnetworks of six neurons each), storing the same subcode and probed by the same code words as the lattice network. The results for several values of $p$ are shown below, where $e_{\text{lattice}}$ denotes the average error for the lattice network and $e_{\text{dc}}$ the average error for the network of disconnected subnetworks.

p            0.0      0.1      0.2      0.3      0.4       0.5
e_lattice    0.0000   0.2000   1.3400   4.3400    8.1600   11.5400
e_dc         0.0000   1.3797   4.2602   9.0601   14.6398   21.4802

It can be seen that, although the lattice network consists of fewer neurons (30) than the network of disconnected subnetworks (42), it has a considerably higher error correction capability.
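The evaluation protocol of examples 5.3 and 5.5 (corrupt each code word bit-wise with probability $p$, relax, and average the final Hamming error) can be sketched as below. This is a minimal illustration under stated assumptions: the circular lattice of figure 3 is not reproduced, and as a stand-in the sketch reuses the six-neuron network of example 2.1, so its numbers are not comparable to the table above. The function names, the random update order, and the seed are hypothetical choices.

```python
import random

def corrupt(x, p, rng):
    """Reverse the sign of each bit independently with probability p."""
    return [-b if rng.random() < p else b for b in x]

def final_error(x, x_f):
    """Hamming distance (x - x_f)^T (x - x_f) / 4 between two +/-1 vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x_f)) // 4  # each differing bit contributes 4

def relax(W, x, rng):
    """Asynchronous updates by rule (2), in random order, until no neuron changes."""
    x, n = list(x), len(x)
    changed = True
    while changed:
        changed = False
        for j in rng.sample(range(n), n):
            h = sum(W[j][k] * x[k] for k in range(n))
            new = 1 if h > 0 else (-1 if h < 0 else x[j])
            if new != x[j]:
                x[j], changed = new, True
    return x

def average_error(W, code, p, probes_per_word, seed=0):
    """Average final error over corrupted versions of every code word."""
    rng = random.Random(seed)
    errors = [final_error(x, relax(W, corrupt(x, p, rng), rng))
              for x in code for _ in range(probes_per_word)]
    return sum(errors) / len(errors)

# Stand-in network: example 2.1 (hypothetical substitute for the lattice of figure 3).
patterns = [[1, 1, 1, 1, 0, 0], [1, -1, 1, -1, 0, 0],
            [0, 0, 1, 1, 1, 1], [0, 0, 1, -1, 1, -1]]
W = [[sum(a[j] * a[k] for a in patterns) for k in range(6)] for j in range(6)]
code = [[1] * 6, [1, -1] * 3, [-1] * 6, [-1, 1] * 3]
for p in (0.0, 0.1, 0.3):
    print(p, average_error(W, code, p, probes_per_word=50))
```

Relaxation always terminates here because $W$ is symmetric with a nonnegative diagonal, so each sign change strictly lowers $H(x) = -x^T W x$.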
Figure 3: Half the code of a circular lattice network storing the subcode {(+ + + + + +), (+ - + - + -)}.
Figure 4: Half the code of the hexagonal lattice network storing the subcode {(+ + + + + +), (+ - + - + -)}.

Example 5.4 Consider the hexagonal lattice network of figure 4. It can be seen that, if the subcode {(+ + + + + +), (+ - + - + -)} is stored in each of the subnetworks, there are only four code words: the first has (+ + + + + +) and the second (+ - + - + -) in all the subnetworks, and the other two are the converses of the first two. Half the code is shown in the figure. This code size is a consequence of the fact that every two neighboring subnetworks share two neighboring neurons. Independently of the lattice geometry, every network having this property will have code size four.

As the above examples indicate, the code size is largely affected by inter-subnetwork relations within the lattice structure. While the last two examples present cases of limited code sizes, the first example presents a case in which the code size increases with the network size. An increase of the subnetwork size, and thereby of the error correction capability, can be achieved in this case by extending the lattice structure to higher dimensions (see, e.g., Conway and Sloane, ref. 4). A three-dimensional extension of the diamond lattice is the cubical lattice depicted in figure 5. It consists of eight neurons in each subnetwork, and its code size for $Q$ shells is $4 \times 256^{Q-1}$.


5.2 Hierarchical Structures 

Information structures or physical development processes may give rise to hierarchical network architectures. Figure 6(a) depicts a network having a "fractal" (see Mandelbrot, ref. 5) tree structure. In this network, which consists of layers of subnetworks of the same number of neurons, each subnetwork in the layers above the lowest consists of neurons of a lower layer, one from each subnetwork. Consequently, each neuron in these layers is a member of two subnetworks belonging to two neighboring layers. The subnetworks at each layer may also be connected into a lattice structure. In the network depicted in figure 6(b), where only the neurons of a single subnetwork in each layer are shown, each neuron at a given layer is also a member of all the layers beneath it. Hence, the entire network consists of the neurons of the lowest layer, and the structure is defined solely by the interneural connections. Since higher layers are nested in lower ones, we call the structure a nested network. The following applies to both types of structures.

Figure 5: A three-dimensional checkerboard cubical structure.

In order to characterize the code of a hierarchical network, we index the layers from top to bottom and define the $k$'th pyramid as the structure consisting of the $k$ top layers. The code construction is completely analogous to that for lattice networks, when analogies are drawn between shells and layers and between boxes and pyramids. Let us denote the subcode for the subnetworks of the $k$'th layer $c^k$ and the code corresponding to the $k$'th pyramid $C_k$. We shall assume that the symbols in each word of $c^k$ and of $C_k$ are arranged in some order, forming vectors of appropriate dimensions. In the hierarchical structure, one of the bit positions of any subcode word position in the $(k+1)$'th layer is also shared by a subcode word position of the $k$'th layer. We shall assume for simplicity that this bit position, which will be called the "common" position, is the same for all the subcode word positions of a given layer. Since the words in the $(k+1)$'th layer must agree with those in the $k$'th layer in the common bit position, only certain permutations of the subcode words of the different layers are permissible. Denoting by $r_k$ a word of $C_k$, by $r_{k,k}$ its part in the $(k-1)$'th pyramid, and by $r^k_k$ its part in the $k$'th layer, let $c^k_j(r_{k,k})$ be the code allowed by $r_{k,k}$ in the $j$'th word position of the $k$'th layer. The set of permissible permutations of words in the word positions of the $k$'th layer associated with $r_{k,k}$ is given by the Cartesian product $\bigotimes_{j=1}^{L_k} c^k_j(r_{k,k})$, where $L_k$ is the number of subnetworks in the $k$'th layer. The pyramidal codes may be obtained recursively by the operation

$$C_{k+1} = \Big\{ r_{k+1} :\ r_{k+1,k+1} \in C_k,\ r^{k+1}_{k+1} \in \bigotimes_{j=1}^{L_{k+1}} c^{k+1}_j(r_{k+1,k+1}) \Big\} \tag{35}$$

with $C_1 = c^1$. Denoting by $M^k_j$ the number of words in $c^k_j(r_{k,k})$, the size of the code is given by

$$M = \prod_{k=1}^{K} \prod_{j=1}^{L_k} M^k_j \tag{36}$$

where $K$ here denotes the number of layers in the network (not the subnetwork size). Both (35) and (36) have the same forms as the corresponding expressions for the lattice networks. It should be noted that when the subnetworks consist of odd numbers of neurons, the associated subpatterns can only be made nearly, not strictly, orthogonal (their inner products would have magnitude 1 instead of 0). For large subnetworks, this slight deviation from the orthogonality assumption is not expected to result in a significantly different network performance. Relatively small subnetworks may be restricted to even sizes in order to maintain the orthogonality requirement.

Figure 6: Tree (a) and nested (b) network architectures.

Example 5.5 Consider the network depicted in figure 7. It consists of two layers of subnetworks of six neurons each, placed at the corners and the centers of pentagons, with the center neurons of the lower layer being shared with the subnetwork of the higher layer. For graphical clarity, only some of the interneural connections within the subnetworks are shown. Each of the subnetworks stores the subcode

{(+ + + + + +), (+ - + - + -)}

in which the first bit corresponds to the center (common) neurons. It can be seen that the only permissible subcode word in the upper layer is (+ + + + + +) and that the code size is $2^6 = 64$. Ten corrupted versions of each of the code words were generated, with an error probability $p$ per bit. The network was probed by each of these 640 words and allowed to relax to a final state. The average error was calculated, as in example 5.3, for several values of $p$. The experiment was then repeated for a network consisting of the disconnected subnetworks of the lower layer. The results are given below, where $e_{\text{nest}}$ and $e_{\text{dc}}$ denote the average errors for the nested network and for the network of disconnected subnetworks, respectively.

p          0.0      0.1      0.2      0.3      0.4       0.5
e_nest     0.0000   0.8378   2.8004   5.0196    6.6997    8.2418
e_dc       0.0000   0.6300   3.6900   9.1800   14.2200   17.8200

It can be seen that, for substantial errors, the nested network has a significantly higher error-correction capability than the network of disconnected subnetworks.

Figure 7: A two-layer nested network for example 5.5.


6. Conclusion 


Outer products of vectors over {-1, 0, 1} define partially connected neural networks, consisting of subnetworks corresponding to the nonzero bits. When each of the subnetworks stores, at most, two mutually orthogonal subpatterns, the code words, defined as the permutations of the ±1 subpatterns that agree in their common bits, are the unique ground states of the associated Hamiltonian. These states can be reached by direct relaxation, if the initial state falls within their regions of attraction, or, otherwise, by such mechanisms as simulated annealing. Specific codes may be constructed by choice of the network structure and the subcodes corresponding to the subnetworks.





References 


1. Hopfield, J. J.: Neural Networks and Physical Systems with Emergent Computational 
Abilities. Proc. Nat. Acad. Sci. USA, vol. 79, Apr. 1982, pp. 2554-2558. 

2. Horn, D.; and Weyers, J.: Hypercube Structures in Orthogonal Hopfield Models. Physical 
Review A, vol. 36, no. 10, Nov. 1987, pp. 4968-4974. 

3. Kirkpatrick, S.; Gelatt, C. D., Jr.; and Vecchi, M. P.: Optimization by Simulated Annealing. 
Science, vol. 220, no. 4598, May 1983, pp. 671-680. 

4. Conway, J. H.; and Sloane, N. J. A.: Sphere Packings, Lattices and Groups. Springer-Verlag, New York, 1988.

5. Mandelbrot, B.: The Fractal Geometry of Nature. W. H. Freeman and Co., New York, 
1983. 





Report Documentation Page

National Aeronautics and Space Administration

1. Report No.: NASA TM-102239
4. Title and Subtitle: Ground-State Coding in Partially Connected Neural Networks
5. Report Date: October 1989
7. Author(s): Yoram Baram*
8. Performing Organization Report No.: A-89256
9. Performing Organization Name and Address: Ames Research Center, Moffett Field, CA 94035
10. Work Unit No.: 505-67-21
12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Washington, DC 20546-0001
13. Type of Report and Period Covered: Technical Memorandum
15. Supplementary Notes: Point of Contact: Dr. Leonard Tobias, Ames Research Center, MS 210-9, Moffett Field, CA 94035, (415) 694-5430 or FTS 464-5430

*Permanent Address: Department of Electrical Engineering, Technion, Israel Institute of Technology, Haifa 32000, Israel.


17. Key Words (Suggested by Author(s)): Neural networks; Coding; Associative memory
18. Distribution Statement: Unclassified-Unlimited; Subject Category 63
19. Security Classif. (of this report): Unclassified
20. Security Classif. (of this page): Unclassified
21. No. of Pages: 18
22. Price: A02


NASA FORM 1626 OCT86 


For sale by the National Technical Information Service, Springfield, Virginia 22161 




